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1-^ ' Abstract 

r? ' Building biological models by inferring functional dependencies from experimental data is an im- 

portant issue in Molecular Biology. To relieve the biologist from this traditionally manual process, 
various approaches have been proposed to increase the degree of automation. However, available ap- 
proaches often yield a single model only, rely on specific assumptions, and/or use dedicated, heuris- 
tic algorithms that are intolerant to changing circumstances or requirements in the view of the rapid 
progress made in Biotechnology. Our aim is to provide a declarative solution to the problem by ap- 
peal to Answer Set Programming (ASP) overcoming these difficulties. We build upon an existing 
approach to Automatic Network Reconstruction proposed by part of the authors. This approach has 
firm mathematical foundations and is well suited for ASP due to its combinatorial flavor providing 
[7^ I a characterization of all models explaining a set of experiments. The usage of ASP has several ben- 

efits over the existing heuristic algorithms. First, it is declarative and thus transparent for biological 
experts. Second, it is elaboration tolerant and thus allows for an easy exploration and incorporation 
of biological constraints. Third, it allows for exploring the entire space of possible models. Finally, 
our approach offers an excellent performance, matching existing, special-purpose systems. 



H , 1 Introduction 

The creation of biological models by inferring functional dependencies from experimental 
data is a key issue in molecular biology. A common approach is to construct descriptive 
models from series of experiments. This (manual) process usually starts from a model 
defined using existing biological knowledge which is then gradually refined by appeal to 
data gathered in subsequent experiments. A model obtained this way is however merely 
consistent with the gathered experimental data, and, besides simulation, no true indication 
can be given how well the resulting model captures the biological system. For instance, 
it is unclear whether the obtained model is one among many or few alternative models. 
Moreover, it is of great interest to know the difference among alternative models in order 
to design new experiments for further discriminating the best fitting model. 

This problem is addressed in the area of Automatic Network Reconstruc- 
tion (ANR) JGifford and Jaakkola 20011 |Yeung et al. 2002} [Liang et al. 1998 
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Repsilber et al. 2002)l. However, the available approaches often yield a single model 



only, rely on specific assumptions, and/or use dedicated, heuristic algorithms for con- 
structing a model from experimental data. Moreover, all these approaches are intolerant 
to changing circumstances or requirements in the view of the rapid progress made in 
Biotechnology. Unlike this, we provide a declarative solution to the problem by appeal to 
Answer Set Programming (ASP; jBaral 2003l l). To this end, we build upon the approach 
to ANR proposed in (Durzinsky et al. 2010 IMarwan et al. 20081) . This approach has 



firm mathematical foundations and is well suited for ASP due to its combinatorial flavor 
providing a characterization of all models explaining a set of experiments. The usage of 
ASP has several benefits over the existing heuristic algorithms. First, it is declarative and 
thus transparent for biological experts. Second, it is elaboration tolerant and thus allows 
for an easy exploration and incorporation of biological constraints. Third, it allows for 
exploring the entire space of possible models. Finally, our approach offers an excellent 
performance, matching existing, special-purpose systems. 

The next section gives a formal introduction to ANR, as provided in 
( [Durzinsky et al. 2010[ IMarwan et al. 2008J . followed by a brief introduction to ASP in 
SectionO Section|4]is dedicated to our solution to ANR in ASP. We empirically evaluate 
our approach in Section |5] and conclude with a discussion and a summary in Section |6] 
andlll 

2 Automatic Network Reconstruction 

Automatic Network Reconstruction aims at constructing all models explaining a set of 
(perturbation) experiments reflecting a certain biological process. Our approach starts from 
experimental time-series data and generates all interaction networks that account for the 
observed mass or signal flow. We briefly describe the steps of this approach proposed in 
( [Durzinsky et al. 20 10[ [Durzinsky et al. 2008[ IMarwan et alTTO OS ). 



We represent a collection S' of n observable species as a vector (si , . . . , s„) being con- 
sidered to be crucial for describing the studied biological phenomenon, along with a cor- 
responding vector {Di , . . . , £>„) of associated capacities over INq- Accordingly, species Si 
is assigned a value from capacitiy Di for 1 < i < n. A state x of species (si, . . . , s„) is 
a vector (xi, . . . , a;„) such that Xi G Di for \ < i < n. Thus, Xi provides the value of 
species Si in state x for 1 < i < n. Note that our concept of a state is only partial because 
it is confined to the observable species in S. In what follows, we leave the set S of species 
implicit whenever clear from the context. 

A (perturbation) experiment £{x^) = (x"; x^, . . . , x^) over 5 is a sequence of states 
reflecting the time-dependent response (x^, . . . , x'') of a biological system to a (specific) 
perturbation of the system in state x°. We associate with each response state x' £ £{x^) 
its terminal state x^ G £{x^) and define t(x*) = x^ for all 1 < i < fc. 

Typically, several experiments E{x^) starting from different initial states x° are neces- 
sary to describe a biological phenomenon. We encode a set £ of different experiments in 
terms of an experiment graph G{£) — {X, Ep U Eft) over S, which is a directed graph 
such that X is the multi-set of states in £, and Ep and En are disjoint sets of perturba- 
tion and response edges, respectively. That is, for each £{x^) = (x°; x^, . . . , x'') e £, we 
have (x°,x^) e Ep and (x%x*+^) e En for 1 < i < fc. For illustration, consider Fig- 
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Fig. 1. An Experiment Graph G(f). 

ure [U showing an experiment graph over species {/r, r, spo^, encoding three experiments 
8{x°) = (x0;xi,...,x4), £:(x2) = (x^; a;^, a;0), and ^(x^) = (a;3; a;^ ... ,a;8). The en- 
tries in each state vector give the respective values of each species; continuous arrows 
represent response edges, dashed ones give perturbation edges. 
An experiment graph G{£) = (X, Ep U Ep) is valid if 

I. every state x & X has at most one outgoing arc in Ep, 
II. X = x' implies t{x) = t{x') for all x^x' ^ X and 

III. {x' - x) ^ IN" holds for all (x, x') e Er, 

Condition I stipulates that an experiment graph is deterministic, while II requires that no 
equalJ states lead to different terminal states. HI demands that there must be at least one 
species that decreases between two consecutive response states. In fact, the experiment 
graph in Figure [T] violates two validity conditions: II is violated through states x^ and x^, 
as these states are equal but lead to differing terminal states a;° and x^, respectively. Ill is 
violated by response edge (x^, x^). 

For the reconstruction, we use the paradigm that system states can be changed by ap- 
plying reactions. A reaction over n species is described by a vector r G Z", where ri < 
for some 1 < i < n. So, a reaction must have at least one negative entry to consume 
at least one species. A reaction r is enabled in a state x over n species with capacities 
(Z?i, . . . , Dn), if we have x, + ri G Dt for all 1 < i < n, i.e. if neither nonnegativity 
nor capacity constraints are violated. For instance, reaction r — (0, —1, 0) is enabled in x® 
because x^ + r — (0, 0, 0) belongs to the species' capacity. 

Given an experiment graph {X, Ep U Ep) and a response edge (x, x') G Ep, we say 
that this response is reaZ/zeii by a sequence (7 ((x,x')) = (r^,...,r') of reactions, if 

IV. if + r' = 2y'+i for all 1 < i < ;, and 

V. (y^, y^, . . . , y'+^) is a sequence of states such that x — y^ and x' = y'^^, 
VI. rl-rl>0 for all 1 < i, j < ^ and all < fc < n. 

All reactions subsequently applied to state x fulfill the response edge and ultimately lead 
to the consecutively observed state x' in Ep. For example, the reaction r — (0, —1, 0) 
constitutes a singleton sequence (j{{x^,x'^)) as x^ + r = x^ realizes (x^,x^). Note 

^ Recall that X is a multi-set; two states are equal if their vector of species is equal 
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that VI stipulates that all reactions in such a sequence must be monotone]^ at micro- 
scopic level, a species cannot be produced and consumed (or vice versa) by two reactions, 
see ( Durzinsky et al. 2(J08) l for details. 



To also account for the experimentally observed mass or signal flow, 
(IMarwan et al. 20()8] l propose to use a partial order on the set of reactions to reflect 
their relative rates. A sequence (r^, . . . , r') of reactions is said to respect such a partial 
order -<, if r' is the unique -<-minimal reaction enabled in an (intermediate) state y^ for 
each 1 < i < I. Note that the reaction order -< must be sufficiently strong to guarantee a 
unique fastest reaction at each step. This implies for each state to have a unique successor 
state, ensuring the system's determinism. 

Following (IMarwan et al. 20()8] l. a regulatory structure (TZ, -<) over species S consists 
of a set of reactions 7?. and a partial order -< among themo A regulatory structure {TZ, ^) 
is conformal with a valid experiment graph {X, Ep U Ep), if 

VII. for all r G 7?., r is not enabled in any terminal state of X, 
VIII. for all e e Ep, there is a ^-respecting realizing sequencq3cr(e) C TZ, and 
IX. there exists no r G 7?. where r is not an element of some (j{e). 



As defined in ( Durzinsky et al. 2010 1, the Network Reconstruction Problem for a valid 
experiment graph consists in finding all regulatory structures conformal with the graph. 

An invalid experiment graph can be recovered by adding new, artificial species to 
50 Given an (invalid) experiment graph {X,Ep U Ep), an extension {X',Ep U 
Ep) with a species is obtained by replacing each state {xi, . . . ,Xn) in X with 
(xi, . . . , Xn, a^n+1, • • • , Xn+a) such that Xn+i G {0, 1} for 1 < i < a; all other capaci- 
ties and edges are left intact. Note that an experiment graph has 2° extensions. 

An extension {X', Ep U Ep) of an experiment graph with a species is valid, if 

X. {X' , Ep U Ep) is a valid experiment graph and 
XI. Xn+i = a^Ji+i for each {x, x') G Ep and 1 < i < a. 

The latter condition stipulates that additional species are not direct targets of experimental 
perturbations, but they certainly respond in successive states. Similarly, we want to reduce 
the changes of additional species in response edges: A response edge {x,x') G Ep is 
subject to an additional change, if a;„-(-j ^ x[-^+j for some 1 < j < a. 

At last, given an invalid experiment graph, the Network Reconstruction Problem consists 
in solving the NRP for all valid extensions of that graph, first, adding a minimum number 
of additional species and, second, comprising a minimum number of additional changes. 
For brevity, such extensions are called minimal valid extensions. 

Figure |2] and |3] show the twqfl valid extensions of the invalid experiment graph in Fig- 
ure [T] The nodes in the figures are the vectors from Figure [Uextended by the two additional 
species. Both extensions differ in the values attributed to the two additional species in state 

^ This is a significant constraint on the quality of time series data. The response of the system must be measured 
with sufficient time resolution, such that oscillation between measurements can be excluded. 

^ 7?- is also referred to as a network because such reaction sets are easily converted to Petri nets, as done 
in ( Marwan et al. 2008). This is however beyond the scope of this paper 

* We slightly abuse notation, and take cr(e) C 7^ to mean that each element of o"(e) is also in TZ. 

^ This allows for differentiating seemingly equal yet different states, enabling new reactions by decreasing addi- 
tional species, or avoiding reactions in terminal states. 

^ Actually, there are four extensions with symmetric behavior on the additional species. 
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Fig. 2. First Extension G{£i ) of the Experiment Graph G{£) in Figure[T] 




Fig. 3. Second Extension G(f 2) of the Experiment Graph G{£) in Figure[T| 



X2 and 2:5. The extensions are conformal with the regulatory structures with the networks 
depicted in Figure 2] and |5] respectively. The additional species are referred to as x and 
y. Reactions are given as boxes, species as circles. All reaction entries have the capacity 
{ — 1, 0, 1}. An arrow from a species to a reaction stands for a —1 in the reaction vector, 
one from a reaction to a species stands for +1. 

Accordingly, the regulatory structure in Figure |4] over species {fr,r, spo,x,y) com- 
prises the following reactions: r^ = ( — 1,0,0,1,0), r^ — (0,-1,0,0,0), r^ = 
(0,-1,0,-1,0), r'' = (0,0,0,-1,1), and r^ = (0, 0, 1, 0, -1) and the ordering ^= 
{(r**, r"^), (r'^, r^), (r^, r^)}. The regulatory structure in Figure |5] comprises the reactions: 
ri ^ (-l,0,0,l,l),r2 = (0,-1,0, -1,-1), r3 = (0, -1, 0, 0, 0), r^ = (0,0,1,-1,0), 
andr^ = (0,0,0,0,-1) and the ordering -<= {{r'^,r^), {r^,r^), (r'^,r^), (r^,r^)}. 



3 Answer Set Programming 



We rely on the input language of the ASP grounder gringo JGebser et al. I l (extending the 
language of Iparse ( Syrjanen} ) and introduce only informally the basics of ASP A com- 
prehensive, formal introduction to ASP can be found in ("Bara l 20031 IGelfond 20081 1. 
We consider extended logic programs as introduced in (.Simons et al. 2002] |. A rule r is 
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Fig. 4. Regulatory Structure conformal with the extended Experiment Graph in Figure|2l 
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Fig. 5. Regulatory Structure conformal with the extended Experiment Graph in Figure [3] 



of the following form: 



H 



Bl, . ■ . , Bm, ~i?m+l, 



^Bn. 



By head{r) = H and body{r) = {Bi, . . . , Bm, ^B 



m+i, ■ • ■ , ^Bn], we denote the head 
and the body of r, respectively, where "'^" stands for default negation. The head H is an 
atom a belonging to some alphabet A, the falsum _L, or a ^^sum constraint L 7^sum[i?i = 
Wi, . . . ,ik = Wk] U. In the latter, £i ~ a^ or £i = r^ai is a literal and Wi a non-negative 
integer weight for ai ^ A and 1 < i < k; L and U are integers providing a lower and an 
upper bound. Either or both of L and U can be omitted, in which case they are identified 
with the (trivial) bounds and oo, respectively. Whenever all weights equal one, the ^sum 
constraint L #suin{£i = 1, . . . ,£k — 1}U becomes a "^count" constraint and is simply 
written as L {£i, . . . , £k} U . A rule r such that head{r) = ± is an integrity constraint. Each 
body component Bi is either an atom or a 7^ sum constraint for 1 < i < n. If body{r) — 0, 
r is called a fact, and we skip "■(— " when writing facts below. We adhere to the definition of 
answer sets provided in JSimons et al. 2002J . which applies to logic programs containing 
extended constructs (=i^suiii constraints) under "choice semantics". 

In addition to rules, a logic program can contain ^minimize statements of the form 

^minimize { £1 — wi, . . . ,£k = Wk }■ 

Besides literals £j, a ^minimize statement includes integer weights Wj for 1 < j < k. A 
^minimize statement distinguishes optimal answer sets of a program as the ones yielding 
the smallest weighted sum for the true literals among £1, . . . ,£k- For a formal introduction, 
we refer the interested reader to (ISimons et al. 2002]) . 

Likewise, first-order representations, commonly used to encode problems in ASP, are 
only informally introduced. In fact, gringo requires programs to be safe, that is, each vari- 
able must occur in a positive body literal. Formally, we only rely on the function ground 
to denote the set of all ground instances, ground{lV), of a program 11 containing first-order 
variables. Further language constructs of interest, include conditional literals, like "a : b", 
the range and pooUng operator ".." and ";" as well as standard arithmetic operations. The 
":" connective expands to the list of all instances of its left-hand side such that corre- 
sponding instances of literals on the right-hand side hold ( Syrjanen IGebser et al. ). While 
".." allows for specifying integer intervals, ";" allows for pooUng alternative terms to be 
used as arguments within an atom. For instance, p(1..3) as well as p(l; 2; 3) stand for the 
three facts p(l), p(2), and p{3). Given this, q{X) : p{X) results in q(l), q{2),q{3). See 
(IGebser et al. I l for a detailed description of the input language of the grounder gringo. 
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4 Declarative Automatic Network Reconstruction 

Our approach expresses the ANR Problem in form of a logic program under answer set 
semantics. In the next sections, we explain the encoding in detail, starting with the repre- 
sentation of the experiment graph. 



4.1 Representing experiment instances 

As common in ASP, a problem instance is given as facts. We define the instance of an 
experiment graph (X, Ep U En) over S with domain D, as a set of facts 

I{X, Ep U Ep) ^ {species{s) \ s e S} 

U { capacity {si,di) \ Si E S and di G D} 

U {state{x) \x eX} 

U {edge{p,x,x') \ {x,x') G Ep} 

U {edge{r,x,x') \ {x,x') G Ep} 

U {terminalState{x') \ x' — t{x),x G X} 

U {value{x, Si, Xi) \ x — {xi, . . . , a;„) € X,l < i < n}. 

Predicates species{si) and capacity {si, di) denote the species Si G S over their associated 
capacities di G D. Similarly, we use state{x) to denote each state x E X and mark termi- 
nal states with predicate terminalState{x). For the edges of the graph, we use predicate 
edge{T, x, x'), where T can be either p or r to indicate that {x, x') is a perturbation or a 
response edge, respectively. 

As an example, Table[T]gives the specification of the experiment graph in Figure[TJ 

species{fr). species(r). species(spo). 

capacity {fr,0..1). capacity {r,0..1). capacity (spo,0..1). 

$tate{x'^). state{x^). state (x^). state{x^). 
state{x'^). state{x^). state{x^). state{x''). state{x^). 



edge{j),x^ ,x^). edge{p,x^,x''). edge{p,x^,x^ 
{r,x^,x 
{r,x^,x 



^ "-^"^ edge(r,x^,x^). edge{r,x^ jX*') 



^ "■^^ edge{r,x^,x^). edge{r,x'^ ,x^). 



terminalState{x'^). terminalState{x'^). terrainalState(x^). 

value{x^^, fr,0). value{x^, fr,l). value {x^, fr,0). value{x^, fr,0). 
value{x^^,r,0). value{x^ ,r,0). value {x^,r,0). value{x^,r,0). 

value{x'^,spo,0). value{x^ ,spo,0). value{x^,spo,0). value{x^,spo,0). 

value{x*, fr,0). value{x^, fr,0). value{x^', fr,0). value{x'^ , fr,Q). value{x^, fr,0). 
value{x*,r,0). value{x^,r,l). value{x^',r,l). value{x''^ ,r,0). value{x^,r,0). 
value{x'^,spo,l). value{x^,spo,0). value{x^',spo,0). value{x'^ ,spo,Q). value{x^,spo,l). 

Table 1 . Specification of Experiment Graph in Figure [2 

Given the instance in Table [T] the solutions of our final logic program represent all 
regulatory structures that are conformal with the extensions of the experiment graph. We 
start with checking the vahdity of the experiment graph. 
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4.2 Checking Validity 

In Section|2] three conditions were specified for validity. Checking these conditions can be 
done with the following logic program. 

Condition I, ensuring that each state has only one outgoing arc, is given in ([T]i- 

^edgeir,Xi,X2),edgeir,Xi,X3),X2^X3. (1) 

Rules (|2]) to (|5]) account for Condition II. We first collect all pairs of states that are not 
equal and then compute the associated terminal state of each state which can be determined 
deterministically. Rule ^ ensures that no two equal states lead to unequal terminal states. 

neq{Xi,X2) <^ value{Xi, S,V),^value{X2, S,V), state{X2)- (2) 

t{X,T) ■<— edge(r, X,T),terminalState{T). (3) 

i(Xi,T) ^ edge{r,Xi,X2),t{X2,T). (4) 

^^neq{Xi,X2),neq{Ti,T2),t{Xi,Ti),t{X2,T2). (5) 

The rules in Q and ^ ensure a decrease in each response as required in Condition III. 

decrease {Xi ) <— edge (r, Xi , X2 ) , value {Xi , S,Vi), value {X2 , 5, V2 ) , 

(6) 
F2 - V^i < 0. 

•f- ^decrease(Xi), edge{r, Xi, X2). (7) 

The next proposition ensures correctness and completeness of this logic program. 

Proposition 1 

Let {X,Ep U Eij) be the experiment graph and Hi be the logic program 

ground{I{X, Ep U Er) U {O, ■ • • , ©}). 

Then, the experiment graph {X, Ep U Ep) is valid iff there exists an answer set of Hi. 

The proof of this and all following results follow from the construction of the respective 
logic programs. Recall that the experiment graph in Figure[T]is invalid, so the correspond- 
ing program has no answer set. 

4.3 Building Regulatory Structures 

We now proceed by defining a finite logic program that allows us to find all regulatory 
structures conformal with a valid experiment graph. 

For guaranteeing the finiteness of the ground program, we need to know the maximum 
number of reactions that shall be used. As each reaction has to consume at least one species, 
the number of reactions is bound by the total number of decreases during a response edge. 
Although there exist better approximations, for simplicity, we define the logic program 
B{X,Ep U Ep) that results in the single answer set A, such that maxAdds{n) G A, 
maxReactions{m) S A and length{x, x' , rn) £ A for all {x, x') G Ep, where n, m G INq 
are sufficient bounds. Given these bounds, we now choose a certain number of reactions in 
do) to be part of the regulatory structure. 

r(l). (8) 

{ r{N+l) } ^ r{N), maxReactions{M), N<M. (9) 



Network Reconstruction in ASP 9 

For the resulting reaction vector, we pick out its values in (fTOb . So, for each reaction and 
each species s,; e S, we choose exactly one value from the set {d, —d \ d £ D,}. This is 
constrained by ( fTTT i ensuring that each reaction has at least one negative entry. 

1 { r{R,S,V) : capacity {S , V) , 

r{R, S, ~V) : capacity {S , V)} 1 <— r{R), species{S). 

^ {r{R,S,V) -.V <0}0, r{R). (11) 

Next, we want to define the realizing sequences guessing a partial order -< of reactions. 
Therefore, we define the intermediate states of a sequence. Each consecutive intermedi- 
ate state is built, adding the currently fastest enabled reaction. As with the reactions, we 
first guess the length of the sequences <t{x, x') in ( fT2] i and (fTsT l. which is bound by the 
precomputed predicate length{x, x' , .). 

step{Xi,0)^ edge{r,Xi,X2). (12) 

{ step{Xi,N + l) } <- step{Xi,N),length{Xi,X2,M),N < M. (13) 

Then, the value of the species is defined for each intermediate step by Rule ( fl4l i and (fTsT i 
by adding the fastest reaction, as stated in Condition FV. 

interValue{0, X, S, V) ^ value{X, S, V). (14) 

interValue{L + 1, X, S, Void + VNew) ^ interValue{L , X, S, Void), 

fastest{R, L, X), r{R, S, Vncw), 
capacity {S, Void + Vncw), 
step{X, L). 

Rules ( fTSI l and ( flTl l find out which reaction is enabled in which intermediate state. 

disabled(R, L, X) i— interValue{L, X, S,Vi), r{R, S,V2), species{S), 

'^ capacity {S, 14 + V2), step{X, L). 
enabled{R, L, X) i disahled{R, L, X), r{R), step{X, L). (17) 

From the enabled reactions, we freely choose a fastest reaction via Rule dTsl l. 

{ fastest{R, L,X)} ^ enabled{R, L, X). (18) 

To impose an ordering on the reactions. Rule (fT9] l says that if there is another enabled 
reaction different from the fastest one, then this one must be slower 

slower {R2,Ri) <~ fastest{Ri, L, Xi), enabled{R2, L, Xi), Ri ^ R2. (19) 

We just need to add transitivity to the predicate slower, and forbid that a reaction is slower 
than itself to enforce the partial order: 

slower{Ri, R3) 4— slower{Ri, R2), slower{R2, -R3). (20) 

^ slower {R,R),r{R). (21) 

To create a vaUd realizing sequence, the constraint in (l22l i assures that the sequence of 



(15) 



(16) 
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reactions leads to the next measured state of the system, as stipulated in V. 

^ interValue{M + l,Xi,S,V), edge{r, Xi,X2),^value{X2, S, V), 
step{X, M), r^step{X, M + 1), species{S). 

Similarly, (|23] | and JTM enforce that reactions are monotone, as dictated by Condition VI. 
This is expressed by saying that a reaction applying in a state x must increase/decrease the 
species into the direction of the next state x', where {x, x') E Ej^. 

^ fastest{R, L, Xi), r{R, S, V3), edge{r, Xi, X2), 

value{Xi, S,Vi), value{X2,S, V2) , (¥2 - Vi) * V3 < 0. 
^ fastest{R, L, Xi), ^r{R, S, 0), edge{r, Xi, X2), 

value{Xi, S, V), value{X2, S, V). 

Condition VII states that for creating a regulatory structure, no reaction my be enabled in 
a terminal state. This is addressed in (IZSll and i 



disabled{R, X) <— terminalState{X) , r{R, S*, Vi), 

value{X, S, V2), ^capacity{S, Vi + V2). 
^ 1 { '^disabled{R,X) : terminalState{X) }, r{R). (26) 

To avoid irrelevant solutions, every reaction must be used. This is addressed in condi- 
tions VIII and IX. That is, each reaction has to be at least one time the fastest reaction (see 
( l27b ) and in each step there must exist at least one fastest reaction (see (l28b). 

^r{R),^l{fastest{R,L,X)}. (27) 

^ step{X, i), - 1 { fastest{R, L, X) }. (28) 

For formulating our correctness and completeness result, we need the following auxil- 
iary definition. 

Definition 1 

Let (si, . . . , s„) be the vector of the species in S. Let A be a set of ground atoms such that 

r{x) G A and r{x, Si, Vi) G A for 1 < i < n. Then, define 7^(2^) = {vi, ■ ■ ■ ,Vn)- 

We then get the following result. 

Proposition 2 

Let {X,Ep U Eu) be the experiment graph and 112 be the logic program 

ground(I{X, Ep U Er) U B{X, Ep U Ep) U {O, . . . , (ESJ}). 

• If A is an answer set of 112, then the regulatory structure [TZ, <) is, conformal with 
the experiment graph {X,Ep U Ep), where TZ ~ {7^(0;) | r{x) £ A} and -<= 
{(7^(2;), 7A (a;')) I slower{x,x') G A}. 

• If there exists a regulatory structure [TZ, -<) that is conformal with the experiment 
graph {X, Ep U Ep), then there exists an answer set A of logic program 112 such 
that 7^ = {7^(2;) I r{x) £ A} and -<= {(7^(2;), 7^(2;')) | slower{x,x') G A}. 
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4.4 Recovering the Experiment Graph 

We now extend our logic program to recover from an invalid experiment graph. 

Given that the minimum number of additional species is provided via predicate 
maxAdds{M), we start by introducing M additional species and their capacities: 

addSpecies{l..M) ^ maxAdds{M). (29) 

capacity{S, V) i— addSpecies{S), V = 0..1. (30) 

We freely choose a value for the additional species in (ISTT i. Moreover, we declare them as 
ordinary species in ( |32] | in order to subject them to all constraints on species. 

1 { value{X, S,V) : capacity [S^V) } 1 <— addSpecies{S), state(X). (31) 

species{S) -f- addSpecies{S). (32) 

In a valid extension, there may be no change in the additional species during a perturbation, 
according to Condition XI: 

<— edge{p, Xi, X2), value{Xi, S, V), '^value{X2, S, V), addSpecies{S). (33) 

Finally, for minimizing the number of changes in the additional species, captured in (|34| |. 
we use the minimize statement in i 



addChange{Xi) <r~ edge{r, Xi,X2),value{Xi, S,V), 
^value{X2, S, V), addSpecies(S). 
^minimize{ addChange{X) }. (35) 

As before, we get the following correctness and completeness result for this encoding. 

Proposition 3 

Let {X, Ep U En) be the experiment graph and Ila the logic program ground{2{X, Ep U 

Ep) U B{X, Ep U Ep) U {O, . . . , m})- 

• If A is an answer set of Ha being minimal wrt statement around(f35[). the regula- 
tory structure (TZ, -<) is conformal with a minimal valid extensioi{]of the experiment 
graph {X,EpUEp), where 7^ = {7a(x) | r{x) e A} and -<= {(7^(2;), 7a(x')) | 
slower{x, x') £ A}. 

• If (7?., -<) is a regulatory structure that is conformal with a minimal valid extension 
of the experiment graph (X, Ep U Ep), then there is an answer set A of Ha being 
minimal wrt statement around{^35\i) such that TZ = {'^a{x) \ r{x) £ A} and 
^— {(7^1(2;), 7^1(2;')) I slower{x,x') £ A}. 

Given the ground instance / in Table[T] which is the logic representation of the experiment 
graph in Figure [T] The answer sets of 

ground{I U {O, • • • , dSll} U {maxReactions{l2)] U {maxAdds{2)} U 
{length{xi,X2, 1), length{x2, X3, 1), length{x3,X4, 1), 
length{xT, xg, 2), length{xQ, 2:7, 2), length{x5, Xq, 3)}) 

^ Recall that such an extension is minimal wrt to the number of species and changes in the additional species. 
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minimal wrt to the statement ground{^5^) do correspond to the regulatory structures 
shown in Figure |4] and |5] Actually, there are four answer sets with symmetric behavior 
on the additional species. To avoid these, we use a symmetry breaking technique, that is 
explained in the next section. 



4.5 Symmetry breaking 

To avoid symmetric models and to speed up computation, we developed symmetry break- 
ing rules for the additional species and the reactions. The additional species can be freely 
labeled obeying the constraints. Therefore producing 2'^ extensions of the experiment 
graph. Given the conditions that must hold for a valid experiment graph and a conformal 
regulatory structure, the number of extensions is restricted. Despite all that, unnecessary 
extensions can be created, as for each extension in Figure|2]and|3]a mirrored version exists 
where the labeling of the additional species is switched between additional species x and 
y. To overcome this, we define an order on the states|f|The evolution of the added species 
s is considered to "precede" that of added species s + 1, if either s changes and s + 1 not, 
or if s decreases more than s + 1 (increases handled analogously). The omitted models can 
easily be reconstructed by permuting the added species. This is of course not necessary, as 
a biologist has to research the "meaning" of the species. 

Furthermore, we do symmetry breaking on reactions. As we name reactions by numbers, 
we enforce them to respect some order.^ We simply use the reaction vector of each reaction 
to impose a fixed order So the reaction with the identifier 1 always has the "smallest" 
reaction vector This way we omit models that differ only in the naming of the reactions. 

As with the refinements of our encoding, discussed in the next section, the corresponding 
logic programs can be found at JOstrowski 201 11 1. 

4.6 Refinements 

The encoding that we have presented above was optimized for readabiUty. An enhanced 
version optimized for performance can be found on the web (IOstrowski2011l l. Also, it 
contains an optimized possibility to compute the static bounds: the maximum number of 
reactions, additional species and number of possible intermediate states. A basic approxi- 
mation for the number of reactions is the number of negative changes of each species, as 
each reaction has to consume at least one species. This usually results in a high maximum 
number of reactions. As this number is crucial for the systems performance, we first solve 
the problem without checking the partial order of the reactions and maximize the number 
of reactions that shall be used. This computation can be done much faster and gives good 
approximations for the maximum number of reactions. For approximating the number of 
intermediate steps, we have a similar approach, considering only two consecutive states 
and again maximizing the number of reactions that are enabled in between. 

We now describe another interesting optimization, this time for the encoding itself. 
To reduce the size of the strongly connected components of the positive dependency 



' Any total order is valid. 
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graph ("L in and Zhao 20 04^ of the logic program, we are using a complete ordering of the 
reactions instead of a partial one. To this end, we replace (fT9] l with the following rules: 

{ slower{Ri,R2) } ^ r(i?i), r(i?2), i?i 7^ i?2. (36) 

<— fastest{Ri,L,X), enabled{R2, L,X), 

Ri 7^ R2, ^slower{R2, Ri). 

We now freely choose an ordering of the reactions in ( |36] | and then assure in ( l37b that 
each other enabled reaction has been chosen to be slower. In this way, we reduce the size 
of the positive cycles in the dependency graph. Now we are no longer restricted to partial 
orders and the number of different solutions would increase drastically, as each partial order 
implies many orderings. To overcome this issue, we project only on the reaction vectors. 
This means that we compute all (minimal) solutions that differ in the reaction vectors 
but avoid solutions with the same set of reaction vectors but different orderings. This is a 
feature of our solver clasp and can be done very efficiently as shown in (G ebser et al. 2009] l 
without enumerating all solutions. Furthermore, different redundant constraints have been 
added to the encoding. For example, a response may not be enabled in a terminal state, 
reactions that apply in a state must sum up to the difference vector, etc. Without these 
optimizations of the encoding, for instance, we were unable to solve the instance "ip3r-l- 
4-dag" used in the next section. 

5 Experiments 

To test the feasibility of our approach, we used "in-silico'13 experiments generated from 
a synthetic bio-chemical network. Several time series are generated and some combi- 
nations of them are shown in Table |2l From the time series, the values of two species 
were removed to simulate experiments not measuring all species. So the data of 14 
species is used. We compared our ASP approach to the direct implementation described 



in (Durzinsky et al. 2010). Unfortunately, no direct implementation exists that does handle 



the range of constraints described here. Some approaches additionally handle the creation 
of catalysators/inhibitors and others lack the check of the partial ordering on the reactions. 
The implementation that comes closest to ours does not compute the number of additional 
species (it has to be given from outside) and does not do the partial order check. But for the 
feasibility test, we decided to compare with this version, refeiTed to as the "direct imple- 
mentation". We tested it, giving the maximum number of additional species, as computed 
by our approach, as input. The benchmark were run single-threaded on an Intel Xeon ma- 
chine with 32GB main memory possessing two 3,4Ghz processor with eight cores each; 
each benchmark was restricted to 2GB of memory and Ih runtime. For the times, we show 
the average of three runs in seconds. MEM indicates that the memory limit was reached. 
For the ASP approach we use the grounder ^Wn^o (3.0.4) and the solver clasp (1.3.6). We 
tested the unoptimzed version (denoted by "unopt. asp times") of our encoding as well 
as the optimzed version with the refinements from Section l4~6l (denoted by "asp times"). 
The number of models gives the number of different regulatory structures that have been 

^ The data is confidential and was made anonymous by our industrial partner. 
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instances 


ASP 

times 


unopt. 
ASP times 


direct impl. 
times 


add. 
species 


models 


maximum 
of reactions 


states 


exper- 
iments 


ip3r-l 


1.3 


4.5 


0.1 


1 


4 


7 


11 


2 


ip3r-l-dag 


1.1 


4.2 


0.1 


1 


4 


6 


12 


3 


ip3r-2-dag 


0.7 


1.5 


0.1 


1 


2 


4 


11 


4 


ip3r-3-dag 


0.3 


0.5 


0.1 





1 


4 


11 


4 


ip3r-4-dag 


0.5 


1.0 


0.1 


1 


8 


5 


10 


4 


ip3r-l+4-dag 


3.6 


16.4 


0.1 


1 


8 


8 


20 


6 


ip3r-l+2-dag 


34.0 


300.9 


MEM 


2 


44 


10 


21 


6 


ip3r-l+3-dag 


59.0 


750.7 


MEM 


2 


128 


10 


21 


6 


ip3r 


30.2 


244.6 


MEM 


1 


2 


9 


37 


11 


ip3r-l-4 


656.7 


3435.3 


MEM 


2 


104 


11 


37 


11 


ip3r-l-4-dag 


3562.3 


TIME 


MEM 


2 


280 


12 


38 


12 



Table 2. Reconstructing a model using in-silico experiments 

reconstructed by the ASP approach. The number of states (measured time points) and ex- 
periments used for the reconstruction is also given. With "maximum of reactions", we refer 
to the maximum of reactions that are used in the regulatory structures. As it can easily be 
seen in Table |2] the number of experiments (and therefore the number of states) and ad- 
ditional species increases the difficulty of the problem. Our refined approach was able to 
solve all of the problems, which means that it is feasible to run it on the shown number 
of experiments and states, as long as the number of additional species stays low. The di- 
rect iniplementation, also having limited functionality, has severe problems with memory 
usage. [^ On the other hand, it has less initialization overhead on the small examples. 



6 Discussion 

In the area of automatic network reconstruction many different approaches have been de- 
veloped. They differ in the used techniques and the kind of system they reconstruct. Sta- 
tistical methods are used e.g. in JGifford and Jaakkola 200 il l and dPeer et al. 20011) recon- 
structing wiring diagrams using Bayesian network methods. More descriptive systems are 
time continuous deterministic dynamical systems. Using ordinary differential equations. 



( Yeung et al. 2002 1 and ( Laubenbacher and Stigler 2004 1 infer a network by solving a non- 
homogeneous system of linear equations, given a set of experiments. The result is a min- 
imal network in means of the structure of the functions. Enumerating algorithms are used 
in ( [Liang et al. 1998| l and (lAkutsu et al. 1999] l to search for the sparsest Boolean model. 
Boolean networks however are more intuitive but less expressive. Hybrid models seem to 
compensate the drawbacks of Boolean models. ( [Repsilber et al. 2002[ ) uses a genetic algo- 
rithm on gene expression data to produce a hybrid model, including quantitative and quali- 
tative information. Depending on the quality of the available experimental data and the type 
of the studied models, further approaches have been developed; see ( jDurzinsky et al. 2010[ 
purzinsky et al. 2008 Wagler 201 1 1 for a detailed comparison to our underlying approach 
in Section|2] 



' We also tested it with 3GB memory restriction, which did not changed any of the results. 
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Our approach does not try to find the "best" model, because this approach looses in- 
formation about important alternatives that only a biologist can decide on. It rather shows 
all possible models conform with the experiments. We use a hybrid model, incorporating 
several discrete levels of concentrations for the species. This extends the purely Boolean 
approach, but can of course not keep up with the expressiveness of differential equations. 
The modeling as a logic program makes it simple to integrate all the various constraints. 
The approach can easily be extended by further constraints, or constraints can be relaxed. 
Although we used a state-of-the-art solver for logic programs, we rigorously had to shrink 
the size of the problem due to different preprocessing steps. In contrast to other approaches, 
this does not change the problem or the solutions. We still infer all possible explanations 
for the experiments. Furthermore our approach benefits from future developments in ASP 
solving, like parallelization. 

From a broader perspective, ASP has already proved its utility for diverse biolog- 
ical applications. Among them, we find JBaral et al. 20041 IDworschak et al. 20081 
LGebser et al. 2008; 'Erde m and Tiire 20081 ISchaub and Thiele 20091 lErdem 20091 
Ibal Palii et al. 2009: .Gebser et al. 20101) . all of which treat rather different biological 
problems from what we tackled in the paper at hand. A feature common to many among 
these approaches is the exploitation of ASP's combinatorial nature in inspecting either all 
or what is common to all solutions to a biological problem. 



7 Summary 

We presented a declarative solution to the ANR problem using ASP. We support check- 
ing validity of an experiment graph, predicting the behavior of unmeasured species, and 
reconstructing all possible explanations for a given set of experiments using a partial or- 
der on the used reactions. We showed that the mathematical representation of the problem 
can be easily translated into a logic program which then can be handled by a state of the 
art grounder and solver for ASP. As the logic program can easily be split into different 
parts, also various versions of the problem (adding or relaxing some of the constraints) 
can be tackled. This is especially useful when the problem is refined to use catalysts or 



inhibitors as done in ( Durzinsky et al. 20101 or just has to be changed for special pur- 



poses. It can also be used to introduce P-Invariants as described in (Durzinsky et al. 2010[ ). 
This avoids rewriting complex programming code and automatically benefits from devel- 
opments in ASP solving. We have shown that our approach is scalable for a certain class 
of perturbation experiments and it is already used by the biology research group of Wolf- 
gang Marwan at the Magdeburg Centre for Systems Biology and in the context of the 
GoFORSYS ( jgoforsys) project. It outperforms the direct implementation of the problem 
while supporting a broader range of functionality. 

In the future we plan to extend our approach in terms of catalysts, as described 
in ( [Durzinsky et al. 20T0) . We then want to combine it with the ordering of reactions and 
also the automatic addition of species. Furthermore, we need to improve the approach to 
be capable of dealing with larger networks, as all currently tested networks are of small 
to medium size. As time series experiments are usually very costly, we want to investigate 
how to find optimal experiments to reduce the number of possible regulatory structures. 
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